-
Notifications
You must be signed in to change notification settings - Fork 327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
default_qsub_arguments -V not working #781
Conversation
e2bf2a7
to
a7bc129
Compare
src/cmds/qsub.c
Outdated
|
||
/* qsub_envlist might be needed if -V option is set in default flags. | ||
* It will be unset if not needed then. */ | ||
qsub_envlist = env_array_to_varlist(envp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to call the function env_array_to_varlist() only when desired i.e. when V_opt is set either while calling qsub or through default_qsub_arguments.
In other words before calling the function set_job_env() make sure that V_opt(This should be set only when qsub is called through -V option or default_qsub_arguments contains -V option) is set or not. If set then only call env_array_to_varlist to populate qsub_envlist and then pass it to set_job_env()
Also we need to write a few PTL test cases to verify the fix. Following are the test cases that I can think of right now.
If required all the above test cases can be clubbed into a single test case. |
""" | ||
os.environ["SET_IN_SUBMISSION"] = "true" | ||
self.server.manager(MGR_CMD_SET, SERVER, {'default_qsub_arguments': '-V'}) | ||
j = Job(ROOT_USER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do need to submit job as root user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to submit the job through a single user only, need not be a root user.
This is needed because the sample environment variable which is set will not be accessed by the job running on other users. The sample environment variable is set only for the current user.
"\nexit 0" + | ||
"\nfi" ) | ||
jid = self.server.submit(j) | ||
self.server.log_match(jid + ";Exit_status=0", id=jid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing new line.
j = Job(ROOT_USER) | ||
j.create_script("#PBS -N default_qsub_arguments" | ||
"\n#PBS -V" + | ||
"\nif [ -z '$SET_IN_SUBMISSION' ]" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In if [ -z '$SET_IN_SUBMISSION' ]
, I doubt that this SET_IN_SUBMISSION will get expanded because it is in single quote
5adfc52
to
db6e3d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the suggested code changes. It looks much better now. Here are a few more comments on PTL.
This Test case exports an environment variable. It sets | ||
-V option through a job script directive and then changes | ||
the value of the environment variable. It expects another job | ||
script to have the changed value of the environment varible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of saying another job script we can say that "The job under execution should get the updated value of the environment variable" since its(Job Script Directives) precedence is higher than the others"
script to have the changed value of the environment varible. | ||
""" | ||
os.environ["SET_IN_SUBMISSION"] = "true" | ||
j = Job(ROOT_USER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of ROOT_USER you can actually use one of TEST_USER variables.
os.environ["SET_IN_SUBMISSION"] = "true" | ||
self.server.manager(MGR_CMD_SET, SERVER, | ||
{'default_qsub_arguments': '-V'}) | ||
j = Job(ROOT_USER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of ROOT_USER you can actually use one of TEST_USER variables.
'SET_IN_SUBMISSION=%s' % env_var)}, | ||
id=jid) | ||
|
||
def test_option_V_with_script(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to rename the test case as test_qsub_opt_V_with_jobscript().
Thanks for all the suggestions and comments. I have updated the PTL test and tested it on the test machine. I have attached the log file for the same. |
src/cmds/qsub.c
Outdated
@@ -4914,6 +4914,10 @@ do_submit(char *retmsg) | |||
if (rc != 0) | |||
return (rc); | |||
} | |||
|
|||
/* Get required environment variables */ | |||
if (V_opt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curios to know: Can you do little qsub performance test with large environment, before and after this change? I am suspecting that by doing this it will slow down qsub. (you use below script to do test)
#!/bin/bash
qmgr -c "s s default_qsub_arguments='-V'"
for i in $(seq 1 1 100)
do
export TEST_VAR${i}="$(printf "${i}%.0s" {1..1000})"
done
start=$(date +'%s')
for i in $(seq 1 1 100)
do
qsub -- /bin/sleep 1000
done
end=$(date +'%s')
echo $((end - start))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said yesterday during our offline discussion it is not good to compare qsub performance with large environment before and after the current code fix. Prior to this code change the function env_array_to_varlist() was not getting called when we set default_qsub_arguments and hence the current bug.
As part of the fix this function call is reintroduced at the right place. So if we compare with and without code fix the performance of qsub after the fix is expected to slow down with -V option. So the ideal test case for performance measure would be to compare qsub with a older version where this bug was not present vs with the current code fix. @sakshamgarg is verifying this test case and will be posting the results and logs to the ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@suresh-thelkar Thanks for update, I was not aware that this issue was regression issue. So now I am fine if we don't want to do performance measure as with -V qsub will be slow is expected, but still up to you and @sakshamgarg for doing perf measurement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with what @suresh-thelkar has suggested in the second part. I did the qsub performance test, between the older version where this bug was not there and the current code fix which has been done, using the above python script. There is no significant change in the performance test.
I have done manual testing for the test case which uses a job script to set -V option as I was facing user permission issue while trying to automate the test. This test case exports an environment variable. It sets -V option through a job script directive and then changes the value of the environment variable. It expects the job under execution to get the updated value of the environment variable since its (Job Script Directive's) precedence is higher than the others. Below is the log for manual testing and I have attached the script files. [hadoop@centos2 ~]$ export EXAMPLE=true |
@sakshamgarg, Thanks for making all the suggested changes and PTL test cases. I sign off. |
917a4ee
to
83b77eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, I don't think this is right fix as I did quick test as below:
#!/bin/bash
qmgr -c "s s default_qsub_arguments = '-V'"
export EXAMPLE=true
qsub -- /bin/sleep 1000
export EXMAPLE=false
qsub -- /bin/sleep 1000
Means for first job EXAMPLE should be true and for second it should be false
But with this fix, EXAMPLE is always (atleast for 60 sec) true
[root@testdev pbspro]# qstat -f | grep 'EXAMPLE'
/pbs/bin:/opt/pbs/sbin,PWD=/pbspro,EXAMPLE=true,HOME=/root,SHLVL=2,
/pbs/bin:/opt/pbs/sbin,PWD=/pbspro,EXAMPLE=true,HOME=/root,SHLVL=2,
I know this happens because of background qsub daemon has old environ when it was forked.
So we need to look for another (right one) fix.
Good catch @hirenvadalia. It looks like qsub's environment is not passed to qsub daemon that we launch internally. Whenever a request is made to qsub daemon we should pass the new environment to it if V_opt is set. If passing this environment turns out to be a costly operation with respect to performance then we can also think of not redirecting the request to qsub daemon when V_opt is set. |
Below is the log after fixing the issue as reported by @hirenvadalia. [saksham@centos ~]$ qmgr -c "s s default_qsub_arguments='-V'" I am also attaching the logs for the manual testing as described in previous comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making all the code changes. I sign off.
validate default_qsub_argument option. | ||
""" | ||
|
||
def test_option_V(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add test for recently discovered use cases (EXAMPLE=true and false one)
|
||
class TestDefaultQsubArgument(TestFunctional): | ||
""" | ||
validate default_qsub_argument option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default_qsub_argument
== default_qsub_arguments
Also I suggest this docstring to something like this: Validate whether qsub sends all environ when -V is enabled via default_qsub_arguments and or on command line.
env_var = os.environ.get("SET_IN_SUBMISSION") | ||
self.server.expect(JOB, {'Variable_List': (MATCH_RE, | ||
'SET_IN_SUBMISSION=%s' % env_var)}, | ||
id=jid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add test case for testing passing full environ via -V on qsub command line
os.environ["SET_IN_SUBMISSION"] = "true" | ||
self.server.manager(MGR_CMD_SET, SERVER, | ||
{'default_qsub_arguments': '-V'}) | ||
j = Job(PBSROOT_USER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are setting SET_IN_SUBMISSION
in current user's session and submitting job as pbsroot user, which internally PTL will do sudo qsub, and SET_IN_SUBMISSION
might not get passed by sudo.
I suggest to submit job as current user's only, you can use self.du.get_current_user()
to get current user name
from tests.functional import * | ||
|
||
|
||
class TestDefaultQsubArgument(TestFunctional): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to put these test cases in already existing test suite under https://github.com/PBSPro/pbspro/blob/master/test/tests/functional/pbs_passing_environment_variable.py
src/cmds/qsub.c
Outdated
if (V_opt) | ||
qsub_envlist = env_array_to_varlist(envp); | ||
|
||
qsub_envlist = env_array_to_varlist(envp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling env_array_to_varlist in every condition with definitely decrease performance of qsub even with out -V because env_array_to_varlist is little heavy function which traverse whole environ char by char.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your statement that it will affect performance of qsub.
The other alternative solution to this problem was to connect to the server at the very starting, before it was loading value to qsub_envlist in the "main" function, and fetch the default qsub arguments from the server. But this renders the use of a daemon pointless.
Although this result in performance degradation but it is the only (valid) solution for this bug.
If needed, we can enforce that -V option never be set in default arguments to ensure performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sakshamgarg Another solution might be possible which is let qsub daemon ask foreground qsub to send envp is -V is enabled in default_qsub_arguments on server.
Flow will be something like this:
- foreground qsub send all data to daemon qsub except envp if no -V on command line, if -V is passed on qsub command line then pass envp along with another data
- daemon qsub will accept data
- if -V is enabled in default_qsub_arguments then ask foreground to send envp if not already send
- daemon qsub will accept envp from foreground qsub
- daemon qsub will submit job
This solution will only qsub performance only in case of -V only which is expected but it doesn't affect performance normal qsub and still we can have daemon qsub
As discussed, the easiest (and right) solution would be to drop backgrounding in case a -V is detected in the qsub default arguments (or qsub command line). Passing on large environments from froreground to background qsubs would be performing bad enough that we might as well do a implied "-f" or direct connect to server (a.k.a no backgrounding) |
2183114
to
b496281
Compare
src/cmds/qsub.c
Outdated
@@ -164,6 +164,7 @@ | |||
#define QSUB_DMN_TIMEOUT_LONG 60 /* timeout for qsub background process */ | |||
#define QSUB_DMN_TIMEOUT_SHORT 5 | |||
|
|||
#define VOPT_ERROR 4321 /* Error code if -V option is detected in background qsub*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having option 'V' is not an error. May be we can rename it like VOPT_SET.
src/cmds/qsub.c
Outdated
@@ -4877,6 +4878,28 @@ do_connect(char *server_out, char *retmsg) | |||
return 0; | |||
} | |||
|
|||
static int | |||
parse_dfltqsubargs(char *retmsg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please write the doxygen header for this function.
src/cmds/qsub.c
Outdated
*/ | ||
refresh_dfltqsubargs(); | ||
rc = parse_dfltqsubargs(retmsg); | ||
if(rc || V_opt){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make it more readable by saying rc == 0 && V_opt.
src/cmds/qsub.c
Outdated
*/ | ||
*do_regular_submit = 0; | ||
if(rc != VOPT_ERROR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of checking whether VOPT_ERROR is not set, do a check for rc == VOPT_ERROR, if yes bypass the following code. Basically you need to optimise the code here.
@sakshamgarg the server forces the background daemons to update the default qsub args (When it changes via qmgr), this code is already there. We need to check it after do_dir() in do_submit() and return with a "Unique" error code. This should percolate all the way back to the foreground qsub as a submission return code (only that it is not), and we need to check that in the foreground and set do_forground_qsub=1 |
src/cmds/qsub.c
Outdated
@@ -5982,6 +5961,8 @@ fork_and_stay(void) | |||
/* set single threaded mode */ | |||
pbs_client_thread_set_single_threaded_mode(); | |||
|
|||
/* set when background qsub is running */ | |||
is_background = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation
src/cmds/qsub.c
Outdated
@@ -236,6 +236,7 @@ static char *pbs_hostvar = NULL; /* buffer containing ",PBS_O_HOST=" and host na | |||
static int pbs_o_hostsize = sizeof(",PBS_O_HOST=") + 1; /* size of prefix for hostvar */ | |||
static char *display; /* environment variable DISPLAY */ | |||
|
|||
static int is_background = 0; /* flag to check if background qsub is active */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the comment to something like "am i the background process? This variable is set only once and is read-only afterwards"
src/cmds/qsub.c
Outdated
/* | ||
* get environment variable if -V option is set. Return the code | ||
* VOPT_SET if -V option is detected in background qsub. | ||
*/ | ||
if(V_opt) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space before (
src/cmds/qsub.c
Outdated
if(V_opt) { | ||
if(is_background) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space before (
src/cmds/qsub.c
Outdated
rc = -1; | ||
sprintf(retmsg, "Failed to recv data from background qsub\n"); | ||
/* Error message will be printed in caller */ | ||
if(rc != VOPT_SET){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space before (
src/cmds/qsub.c
Outdated
@@ -164,7 +164,7 @@ | |||
#define QSUB_DMN_TIMEOUT_LONG 60 /* timeout for qsub background process */ | |||
#define QSUB_DMN_TIMEOUT_SHORT 5 | |||
|
|||
#define VOPT_ERROR 4321 /* Error code if -V option is detected in background qsub*/ | |||
#define VOPT_SET 4 /* return code if -V option is detected in background qsub*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably best to rename this return code to be more generic, so that it can be used later for other similar purposes, if necessary. Rename it to something like DMN_CANT_SERVE (or think something better)
src/cmds/qsub.c
Outdated
if(V_opt) { | ||
if(is_background) | ||
return VOPT_SET; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it would be better if we can check this in do_daemon_stuff and exit the daemon in this return case. This way future qsubs will not have to continuously consult with the background qsub just to know that it should not be going to the background
src/cmds/qsub.c
Outdated
@@ -6246,9 +6246,8 @@ main(int argc, char **argv, char **envp) /* qsub */ | |||
basic_envlist = job_env_basic(); | |||
if (basic_envlist == NULL) | |||
exit_qsub(3); | |||
if(V_opt) { | |||
if(V_opt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still has an indentation problem
src/cmds/qsub.c
Outdated
|
||
/* | ||
* Something bad happened, either background submitted | ||
* and failed to send us response, or it failed before | ||
* submitting. If background qsub detects -V option, then | ||
* submit the job through foreground. | ||
*/ | ||
if(rc != VOPT_SET){ | ||
if (rc != DMN_REFUSE_EXIT){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still has a indentation issue between ){
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the code changes. You can implement optional minor comment that I have given if you would like to. However I sign off.
@@ -4915,6 +4922,16 @@ do_submit(char *retmsg) | |||
return (rc); | |||
} | |||
|
|||
/* | |||
* get environment variable if -V option is set. Return the code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you repharse it as "get the variable environ which points to an array of pointers to strings of the user environment if -V option is set....".
Attaching the logs of PTL tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
c27c79c
to
28b20e8
Compare
Not starting background qsub if -V option is set
28b20e8
to
7847e71
Compare
@subhasisb I have squashed all commits and rebased. |
Bug/feature Description
Affected Platform(s)
Cause / Analysis / Design
Solution Description
Testing logs/output
Checklist:
For further information please visit the Developer Guide Home.